I. INTRODUCTION


In this very volatile and technological age, it is
imperative that communication links and computer memories
transmit information reliably and quickly. However, in many
cases this is virtually impossible because noise causes the
received data to differ significantly from the original
data. To rectify this situation, error-correcting codes
have been developed that enable a system to maintain a high
degree of reliability despite the presence of noise. To
accomplish the error correction, some additional redundant
check bits, or parity bits, are transmitted along with the
data or information bits. In this way, although the noise
may introduce some errors in either the transmitted data
bits or the transmitted check bits, there are usually still
enough uncorrupted bits available to the receiver to allow a
sophisticated decoder to correct the errors. In fact, only
a modest amount of redundancy is actually needed to ensure
that the probability of a decoding error is negligibly
small. [Ref. 1]

Moreover, unlike the encoders and decoders of the 1950's
and 1960's, which were constrained by digital hardware costs
and virtually nonexistent chip technology, today's encoders
and decoders, coupled with significant improvements in their
associated algorithms, have become, and will continue to be,
increasingly attractive from an economic viewpoint.

One class of error-correcting codes which is very popular
in communication circles, and is paramount in this author's
discussion of systolic array encoders and decoders, is the
class of Reed-Solomon (RS) codes. These codes can correct
both random and burst errors over a communication channel,
and as such are ideal for the very low error probabilities
needed for reliable space communications. Still, the
usefulness of RS codes is limited by the complexity of the
encoder that produces them and of the decoder by which
errors are corrected. The encoder complexity is directly
proportional to the error-correcting capability of the code,
the speed of the encoding process, and the interleaving
level used, i.e., the number of original codewords which are
multiplexed together to increase the immunity of the code to
burst errors [Ref. 1]. In fact, for truly reliable space
communications there is a bona fide need to use RS codes
with a large error-correcting capability and an equally
large interleaving level. As a result, one is especially
interested in minimizing the complexity of an RS encoder
while simultaneously ensuring maximum performance and high
reliability. Clearly, what is needed for this type of
application is a special-purpose system which complements
the aforementioned attributes. A systolic array is
therefore a natural architecture for the simple, regular,
and cost-effective implementation of an RS encoder and
decoder.

To aid the reader's comprehension, this author has devoted
a separate chapter to each topic vital to the thesis.
After systolic arrays are introduced in Chapter II, the
fundamentals of finite fields necessary for an understanding
of Reed-Solomon codes are discussed in Chapters III and IV.
In Chapter V a systolic array multiplier for finite fields
is discussed, and finally in Chapter VI the encoder and
decoder for binary codes are described, as well as the
encoder and decoder for RS codes.
II. SYSTOLIC ARRAYS


A. BACKGROUND 

It is clear today that developments in microelectronics
have had a revolutionary impact on computer design
[Ref. 2]. For example, integrated circuit technology has
brought a significant increase in the number and complexity
of components that can now fit on a chip or a printed
circuit board. In fact, with the component density
presently doubling every one to two years, the
million-transistor chip will soon be a reality [Ref. 3].
Commensurate with this major increase in chip density is the
utilization of highly parallel computing structures which,
almost by definition, imply a basic computational element
repeated hundreds or thousands of times. This architectural
style, which has structural properties suitable for VLSI
implementation, reduces the design problem by several orders
of magnitude. As a result, we are interested in
high-performance parallel structures that can be implemented
directly via very economical hardware devices [Ref. 2]. In
other words, cost-effectiveness has always been, and will
continue to be, a major concern in designing special-purpose
VLSI systems; their cost must be low enough to justify their
limited applicability. Furthermore, if a structure can
truly be decomposed into a few types of building blocks
which are used repetitively with simple interfaces,
tremendous savings can be achieved.

This is especially true for VLSI designs where a single 
chip usually comprises hundreds of thousands of identical 
components. Clearly, in order to overcome this 
complexity, simple and regular designs are essential. In 
fact, VLSI systems which are based on simple, regular lay- 
outs are very likely to be modular and therefore adjustable 
to various performance levels. Still, with the technological 
indication of a diminishing growth rate for component speed, 
any major improvement in computation speed must come from 
the concurrent use of many processing elements. [Ref. 3] 

The degree of concurrency in a VLSI computing structure 
is largely determined by the underlying algorithm. 
Consequently, massive parallelism can be achieved if the 
algorithm is designed to exploit high degrees of pipelining 
and multiprocessing. For instance, when a large number of 
processing elements work simultaneously, coordination and 
communication become significant--especially with VLSI tech- 
nology where routing costs dominate the power, time, and 
area required to implement a computation. Thus, the
requirement is to design algorithms that support high
degrees of concurrency and, at the same time, employ only
simple, regular communication and control to ensure
efficient implementation. [Ref. 4]

Clearly, what is required is a special-purpose design
which employs simple and regular communication paths for
multiprocessor structures, in addition to pipelining as a
general method for utilizing these structures. In short,
systolic arrays provide a realistic model of computation
which captures these concepts of pipelining, parallelism,
and interconnection structures.

According to Kung and Leiserson [Ref. 2]: 


A systolic array is a collection of relatively simple 
processing units, usually of the same type, which are 
connected together by a simple communication network and 
that operate in parallel, as depicted in Figure 2.1. 
The performance advantage of a systolic array architec- 
ture is that it uses each datum retrieved from memory 
numerous times without having to store and retrieve 
intermediate results, thus allowing significant speedups 
relative to the memory bandwidth. Thus, a systolic
system is a network of processors which rhythmically
computes and passes data through the system. The
analogy is to the rhythmic contraction of the heart
which pulses blood through the circulatory system of the
body. Each processor in a systolic network can be
thought of as an element through which multiple streams 
of data are pumped. The regular beating of these 
parallel processors maintains a constant flow of data 
throughout the entire network. As data items are pumped 
through the network some constant-time computation is 
performed and, depending on the operation, updates of 
some of the items may occur. However, unlike the 
closed-loop circulatory system of the body, a systolic 
computing system usually has ports into which inputs 
flow, and ports from which the results of the computation
are received. Thus, a systolic system can be
viewed as a pipelined system--one in which input and
output occur with every pulsation.


As a result, the systolic approach is extremely attractive
for a wide class of compute-bound computations where
multiple operations are performed on each data item in a
repetitive manner.


(a) ONE-DIMENSIONAL LINEAR ARRAY
(b) TRIANGULAR ARRAY
(c) TWO-DIMENSIONAL SQUARE ARRAY
(d) TWO-DIMENSIONAL HEXAGONAL ARRAY

Figure 2.1 Various Systolic Array Configurations
B. PRINCIPLE OF OPERATION

The basic principle of a systolic array is illustrated 
in Figure 2.2. As stated earlier, by replacing a single 
processing element (PE) with an array of processing 
elements, a higher computation throughput can be achieved 
without increasing the memory bandwidth. 


Suppose each processing element in Figure 2.2 operates 
with a clock period of 100 nanoseconds (ns). The con- 
ventional memory-processor organization in Figure 2.2a 
has at most a performance of 5 million operations per 
second (MOPS). With the same clock rate, the systolic 
array processor will result in 30 MOPS performance. 
This gain in processing speed can also be justified with 
the fact that the number of pipeline stages has been 
increased six times in Figure 2.2b. Being able to use 
each input data item a number of times is just one of 
the many advantages of the systolic approach. Other 
advantages include modular expansibility, simple and
regular data and control flows, use of simple and 
uniform cells, elimination of global broadcasting, 
limited fan-in and fast response time. [Ref. 3] 


Given these advantages, a systolic array is a natural
architecture for the implementation of an RS encoder and
decoder, as will become apparent after our introduction of
Reed-Solomon codes in Chapter IV.
(a) THE CONVENTIONAL PROCESSOR
(b) A SYSTOLIC PROCESSOR ARRAY

Figure 2.2 The Concept of a Systolic Processor
III. FINITE FIELD THEORY


A. BACKGROUND

Finite or Galois fields (named after the nineteenth 
century French mathematician Evariste Galois) play many 
important and diverse roles in numerous applications ranging 
from digital signal processing to switching theory. However,
in this thesis we are concerned with their use in the
construction of Reed-Solomon error-correcting codes. We
begin with a general analysis of the pertinent facts 
regarding finite fields. In the next chapter the necessary 
facts about Reed-Solomon codes are discussed. 

A field is a set of elements, including 0 and 1, any
pair of which may be added or multiplied (denoted by + and
*, respectively) to give a unique result in the field. The
addition and multiplication are associative and commutative,
and multiplication distributes over addition in the usual
way: u*(v+w) = u*v + u*w. Every field element u has a unique
negative -u such that u+(-u) = 0. Every nonzero field element
u has a unique reciprocal field element 1/u such that
u*(1/u) = 1. For every field element u, 0+u = u = 1*u, and
0*u = 0. Thus the elements 0 and 1 are the additive and
multiplicative identities, respectively. [Ref. 5]

The order of a field is the number of elements in the
field. If the order is infinite, we call the field an
infinite field. The rational numbers, the real numbers, and
the complex numbers are all examples of infinite fields. If
the number of elements is finite we call the field a finite
field [Ref. 5].

For any prime p and any positive integer m a Galois
field with q = p^m elements, denoted GF(p^m) or GF(q),
exists. We can construct a field containing p^m elements as
an algebra of polynomials over GF(p) modulo an irreducible
polynomial of degree m. Addition is coefficient-by-coefficient
modulo p addition (bit-by-bit modulo 2 addition when p = 2).

The multiplicative group of the nonzero field elements
is cyclic, i.e., it is a group that consists of all the
powers of one of its elements, β. Multiplication is defined
as β^i * β^j = β^(i+j), where i+j is computed modulo (p^m - 1)
and β is a generator of this group. A generator of this
multiplicative group, called a primitive element, is a root
of an irreducible polynomial over the prime field GF(p).
This irreducible polynomial, called a primitive polynomial,
is the minimal polynomial of the primitive element, i.e.,
the polynomial of least degree with the primitive element β
as a root. Generally speaking, an irreducible polynomial is
analogous to a prime number: it has no nontrivial factors.
Lastly, the Galois fields that can be created by taking
residue or equivalence classes of polynomials modulo an
irreducible polynomial over GF(p) are said to be fields of
characteristic p. Thus, GF(p^m) is a field of characteristic
p for each choice of positive integer m [Ref. 6].
B. AN EXAMPLE OF THE CREATION OF A FIELD

Consider the Galois field GF(2^4). It has 2^4 = 16 elements
and may be constructed as the field of polynomials over
GF(2) modulo the irreducible polynomial 1+x+x^4. If we let β
represent a root of 1+x+x^4, then it is also a primitive
element of the field. Field addition of the elements is
bit-by-bit modulo 2 addition while multiplication of the
elements is described using the primitivity of the element
β. Thus, β^i * β^j = β^(i+j), where i+j is reduced modulo 15
if necessary. For example, given the field elements β^13 and
β^9, two of the 15 nonzero field elements listed in
Table I, we can easily demonstrate both operations:

$$\beta^{13}+\beta^{9} = (\beta^3+\beta^2+1)+(\beta^3+\beta) = 2\beta^3+\beta^2+\beta+1 = \beta^2+\beta+1 = \beta^{10}$$

while

$$\beta^{13}*\beta^{9} = \beta^{13+9} = \beta^{22} = \beta^{22-15} = \beta^{7} = \beta^3+\beta+1.$$
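The two operations in this example can be reproduced with a short script. The following is a minimal sketch, not part of the original design: it assumes each field element is stored as a 4-bit integer whose bit k is the coefficient of β^k, and the names `EXP`, `LOG`, `gf_add`, and `gf_mul` are illustrative only.

```python
# Build GF(2^4) from the irreducible polynomial 1 + x + x^4.
# Assumption for this sketch: an element is a 4-bit integer whose
# bit k is the coefficient of beta^k.

PRIM_POLY = 0b10011   # x^4 + x + 1
ORDER = 15            # number of nonzero elements, 2^4 - 1


def build_tables():
    """Return (exp, log) with exp[i] = beta^i and log[beta^i] = i."""
    exp, log = [0] * ORDER, {}
    value = 1
    for i in range(ORDER):
        exp[i] = value
        log[value] = i
        value <<= 1               # multiply by beta
        if value & 0b10000:       # degree 4 appeared: use beta^4 = beta + 1
            value ^= PRIM_POLY
    return exp, log


EXP, LOG = build_tables()


def gf_add(a, b):
    """Field addition: bit-by-bit modulo 2, i.e. XOR."""
    return a ^ b


def gf_mul(a, b):
    """Field multiplication: add exponents modulo 15."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % ORDER]


# List every nonzero element as a power of beta and as a 4-tuple.
for i in range(ORDER):
    print(f"beta^{i:<2d} {EXP[i]:04b}")

sum_exponent = LOG[gf_add(EXP[13], EXP[9])]       # beta^13 + beta^9
product_exponent = LOG[gf_mul(EXP[13], EXP[9])]   # beta^13 * beta^9
print(sum_exponent, product_exponent)
```

The log/antilog tables turn multiplication into an addition of exponents modulo 15, which is exactly the rule stated in the text; addition remains a plain XOR of the 4-tuples.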
TABLE I

REPRESENTATION OF GF(2^4)

FIELD ELEMENT    BETA POLYNOMIAL    4-TUPLE
IV. REED-SOLOMON CODES


A. BACKGROUND 

Reed-Solomon (RS) codes are Bose-Chaudhuri-Hocquenghem
(BCH) codes over GF(q) of length q-1. They are
error-correcting codes which are used in many special-purpose
applications ranging from deep-space communications and
spread spectrum to digital audio disk systems and secure
data transmissions [Ref. 7]. These codes can correct both
random and burst errors over a communication channel and
hence are ideal for the real-time, high-reliability
requirements of these applications. The complexity
of RS encoders and decoders is proportional to the
error-correcting capability of the code, the speed of the
decoding, and the interleaving depth used [Ref. 8]. For truly
reliable communications there is a very strong tendency to
use RS codes with a large error-correcting capability and an
equally large interleaving level. Hence, one is especially
interested in minimizing the complexity of RS encoders and
decoders for communications and other pertinent
applications. Toward this end, there is considerable
interest in systolic array construction and eventual VLSI
implementation of RS encoders and decoders which yield
significant savings in size, weight, and power consumption
while simultaneously providing high reliability.

In this chapter we look at a generic construction and
architecture of an RS encoder developed by Johl [Ref. 9]
and use this design as a foundation for the discussion and
implementation in the later chapters. This implementation
utilizes a systolic architecture of identical cells arranged
in a linear array, each executing a finite-field
multiplication and addition in a pipelined manner, thereby
significantly increasing the throughput rate. Also, since
the layout of the cell need only be done once and then
replicated, it is extremely attractive for eventual VLSI
implementation.


B. GENERIC ARCHITECTURE

The RS code is a block code which consists of symbols of
more than one bit. When each symbol is J bits wide, an RS
codeword has 2^J - 1 symbols. As depicted in Figure 4.1, an
RS code can be designed to be capable of correcting E errors
with each codeword consisting of I information symbols
together with 2E parity or check symbols. As an example,
the relation 1+β+β^4 = 0 and its corresponding finite field,
as described in Table I, establish an important foundation
for the development of a generic RS encoder. This RS code
consists of a total of 15 four-bit symbols for each
codeword. If this particular code is to correct one error,
it needs two parity symbols and therefore contains thirteen
information symbols. This representation is known as an
RS (15,13) code, where the first integer gives the total
number of symbols in the codeword and the second integer
gives the number of information symbols. It is the
responsibility of the encoder to use the information symbols
to generate the check or parity symbols for the codeword.
The information symbols are treated as coefficients of a
polynomial f(x),

$$f(x) = \sum_{i=1}^{2^J-1-2E} f_i\, x^{2^J-1-i}$$

where f_i is the ith transmitted information symbol. The
corresponding generator polynomial is known as g(x),

$$g(x) = \prod_{i=1}^{2E} (x+\beta^i)$$

Then, the 2E parity symbols are defined as the coefficients
of the remainder of f(x)/g(x). Therefore, in the RS (15,13)
code previously mentioned

$$g(x) = \prod_{i=1}^{2} (x+\beta^i) = (x+\beta)(x+\beta^2) = x^2+\beta^5 x+\beta^3$$
Furthermore, let us assume that the thirteen information
symbols are β^6, β^1, β^8, β^2, β^4, β^5, β^12, β^7, β^9,
β^11, β^14, β^3, β^13. Then,

$$f(x) = f_1x^{14}+f_2x^{13}+f_3x^{12}+f_4x^{11}+f_5x^{10}+f_6x^{9}+f_7x^{8}+f_8x^{7}+f_9x^{6}+f_{10}x^{5}+f_{11}x^{4}+f_{12}x^{3}+f_{13}x^{2}$$

$$= \beta^6x^{14}+\beta^1x^{13}+\beta^8x^{12}+\beta^2x^{11}+\beta^4x^{10}+\beta^5x^{9}+\beta^{12}x^{8}+\beta^7x^{7}+\beta^9x^{6}+\beta^{11}x^{5}+\beta^{14}x^{4}+\beta^3x^{3}+\beta^{13}x^{2}$$

Performing the required division, f(x)/g(x), with the
divisor g(x) = x^2+β^5x+β^3, gives the quotient

$$\beta^6x^{12}+\beta^6x^{11}+\beta^0x^{10}+\cdots+\beta^{14}x^2+\beta^9x+0$$

and the remainder β^12 x + 0.

Hence, the remainder coefficients we seek are β^12, 0 and
thus the corresponding 15-symbol codeword is β^6 β^1 β^8 β^2
β^4 β^5 β^12 β^7 β^9 β^11 β^14 β^3 β^13 β^12 0, where the
first thirteen symbols represent the information symbols and
the last two symbols represent the parity symbols. [Ref. 9]
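The worked RS (15,13) example can be checked numerically. The sketch below is a plain software polynomial division over GF(2^4), not the systolic hardware; it assumes symbols are 4-bit integers with bit k holding the coefficient of β^k, and the helper names (`poly_remainder`, `rs_encode`, `show`) are hypothetical.

```python
# GF(2^4) log/antilog tables for the field defined by 1 + x + x^4.
# Assumed encoding for this sketch: bit k of a symbol is the
# coefficient of beta^k.
EXP, LOG = [0] * 15, {}
value = 1
for power in range(15):
    EXP[power], LOG[value] = value, power
    value <<= 1                 # multiply by beta
    if value & 0b10000:         # reduce with beta^4 = beta + 1
        value ^= 0b10011


def gf_mul(a, b):
    """Multiply two field elements by adding exponents modulo 15."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 15]


def poly_remainder(dividend, divisor):
    """Remainder of dividend/divisor over GF(2^4).

    Coefficients are listed highest degree first; the divisor is monic.
    """
    work = list(dividend)
    for i in range(len(dividend) - len(divisor) + 1):
        q = work[i]                         # next quotient term
        for j, g in enumerate(divisor):
            work[i + j] ^= gf_mul(q, g)     # subtraction is XOR
    return work[-(len(divisor) - 1):]


# g(x) = (x + beta)(x + beta^2) = x^2 + (beta + beta^2) x + beta^3
G = [1, EXP[1] ^ EXP[2], gf_mul(EXP[1], EXP[2])]


def rs_encode(info):
    """Append the 2E = 2 parity symbols to 13 information symbols."""
    parity = poly_remainder(list(info) + [0, 0], G)
    return list(info) + parity


def show(symbol):
    """Render a symbol as a power of beta (or 0)."""
    return "0" if symbol == 0 else f"beta^{LOG[symbol]}"


INFO = [EXP[e] for e in (6, 1, 8, 2, 4, 5, 12, 7, 9, 11, 14, 3, 13)]
CODEWORD = rs_encode(INFO)
print(" ".join(show(s) for s in CODEWORD))
```

Because subtraction and addition coincide in a field of characteristic 2, appending the remainder makes the full 15-symbol codeword divisible by g(x), which gives a convenient self-check on the encoder.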

The architecture of the systolic implementation consists
of a regular array of identical cells. Division is
performed in a pipelined manner by simultaneously entering
the highest-order terms of the f(x) and g(x) polynomials at
the leftmost cell and generating the appropriate codeword
on the far right, as depicted in Figure 4.2. In fact, one
codeword can immediately follow the previous one without any
interruption in the pipeline flow. Likewise, the control is
also systolic. One control bit pipeline path signals the
start of a new codeword; another signals the start of the
division operation. Meanwhile, each cell of the array holds
one term of the quotient. As a result, if d represents the
difference in degrees between the two polynomials, then

$$d = \deg f(x) - \deg g(x)$$

and thus d+1 cells are required. For example,

$$\deg f(x) = 14, \qquad \deg g(x) = 2, \qquad d = 12 \ \text{(degree of quotient)}$$

From our previous calculation, the quotient was
β^6x^12 + β^6x^11 + β^0x^10 + ... + β^14x^2 + β^9x + 0.
Since it consists of thirteen terms, thirteen cells would be
needed. In general,

$$\deg f(x) = 2^J-2, \qquad \deg g(x) = 2E, \qquad d = 2^J-2E-2$$

and so the total number of cells required is d+1 or
2^J-2E-1. [Ref. 9]

The operation of each cell is simple and regular.
Essentially, it accomplishes one line of the normal division
by first determining the specific term of the quotient,
multiplying it by the divisor, subtracting the result from
the dividend, and finally passing along the divisor and
partial result to the next cell. More specifically, there
are three J-bit data paths and two 1-bit control paths, as
shown in Figure 4.3. The function of the C data path is to
allow the information symbols to pass through the array
unchanged while the other two data paths, A and B, are for
the dividend and divisor, respectively. The register Q is
set at the start of the division and remains the same
throughout the polynomial division of one block. The
register B is used as a temporary storage device. While a
control bit accompanies the first byte of information to
signal the start of a new codeword, a preceding start bit,
at one-half the rate of the control bit, initiates the
division operation in each cell. In short, the above
architecture is simply a pipelined parallel processor which
is composed of a systolic array of identical cells, each
performing a finite-field multiplication and addition.
Since the layout is simple and regular, it is easily
replicated and economical to produce. [Ref. 9]

Figure 4.3 The Systolic Cell Structure

In Chapter VI the encoder and decoder for an RS code are
described in greater detail with the encoding and decoding
process carried out for a specific example.
V. IMPLEMENTATION THEORY


A. BACKGROUND 

In this chapter we look at the theoretical concepts
behind the systolic implementation of an encoder and
decoder. We then apply these concepts to the actual
implementation in the subsequent chapter. There, the binary
case is presented first because of its simple architecture
and ease of understanding. It is then followed by the more
intricate and complex Reed-Solomon case.

In this chapter we also discuss in depth the design of
a systolic array multiplier used in the RS encoder. Unlike
the binary case, which deals only with the elements 0 and 1
in the complete codeword, the Reed-Solomon codeword contains
symbols which lie in a larger field than GF(2). As a
result, the systolic array multiplier is considerably more
detailed and complicated than the binary design, which
simply uses a primitive binary shift register scheme.


B. PRIMITIVE BINARY SHIFT REGISTER DESIGN

A primitive binary shift register is a series of
registers each capable of containing a zero or a one. The
contents of the register all shift on a designated time
signal supplied by an external clock. The contents of the
newest stage of the register are defined as a function of
the current contents of the register. Because these shift
registers utilize this feedback property they are commonly
referred to as feedback shift registers, or as primitive
shift registers since the feedback is usually described by a
primitive polynomial [Ref. 10]. For example, the diagram in
Figure 5.1 describes a primitive shift register composed of
four registers, labeled 1, x, x^2, x^3, and one modulo 2
adder situated between registers 1 and x. Each register is
capable of storing one bit of binary information, i.e., a
zero or a one. The all-zero contents of the register is
typically prohibited. This restriction is placed on the
primitive shift register to ensure a change of state when a
new clock signal is received. The register is allowed to
step from state to state; the length of a primitive cycle
is independent of its initial state and is equal to
2^m - 1. The primitive shift register of Figure 5.1 will
move through 15 distinct binary patterns before repeating
(see Table II). This primitive shift register is said to
have a cycle length of 2^4 - 1 or 15. Moreover, since all
nonzero patterns are included in the cycle, it is called a
maximum-length cycle. In general, a primitive shift
register composed of m stages will generate a maximum-length
cycle of period 2^m - 1. It is possible for each value of m
to determine a primitive feedback function for the shift
register so that a maximum-length shift register sequence of
period 2^m - 1 is generated.
TABLE II

REGISTER CONTENTS AFTER SUCCESSIVE CLOCK SIGNALS

REGISTER CONTENTS    BETA POLYNOMIAL


Maximum-length cycles and maximum-length sequences have
broad applications in data communication systems and
computer simulation, while primitive shift registers
designed as division circuits have applications in coding
theory [Ref. 10]. It is the objective of this chapter to
utilize the concepts of the latter to propose an RS encoder
and decoder.

In order to generate a maximum-length cycle or sequence
we need to understand the necessary component connections
given a primitive polynomial. That is, given an arbitrary
primitive polynomial, how do we design the shift register?
For the example of Figure 5.1, assume p(x) = 1+x+x^4 is a
primitive polynomial over GF(2). We can consider GF(2^4) as
an algebra of polynomials modulo p(x) = 1+x+x^4 and design a
register to produce a pattern cycle of length 2^4 - 1. Using
four delay units (since we need a register unit for the
coefficient of each term x^t with 0 ≤ t ≤ 3) we need only
decide how the primitive polynomial affects the feedback to
know where to place the modulo 2 adder components and where
to make the necessary circuit connections. The feedback is
the coefficient of x^4, but in this polynomial algebra
x^4 = 1+x. Thus, the feedback goes to the registers which
contain the coefficients of the x^0 and x^1 terms. Making
these connections and supplying the modulo 2 adder component
where we have two inputs to the register, we arrive at the
shift register given in Figure 5.1. Each step of the
register is then equivalent to multiplying the contents of
the register by the primitive element β. Thus, the sequence
of contents is the sequence of powers of β modulo 1+β+β^4.
In this way multiplication of the elements of the field is
produced simply, as described in Chapter III, and the powers
of β are as given in Table I of Chapter III.
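The register described above is easy to simulate. The sketch below is one possible reading of the design, under the assumption that the state is held as a tuple (c0, c1, c2, c3) of the coefficients of 1, x, x^2, x^3; the function names `step` and `cycle` are illustrative.

```python
def step(state):
    """Advance the four-stage register by one clock signal.

    state = (c0, c1, c2, c3) holds the coefficients of 1, x, x^2, x^3.
    Shifting multiplies by x; the bit leaving the x^3 stage is the
    coefficient of x^4, which is fed back as x^4 = 1 + x.
    """
    c0, c1, c2, c3 = state
    feedback = c3
    return (feedback, c0 ^ feedback, c1, c2)


def cycle(start=(1, 0, 0, 0)):
    """Collect the successive register contents until the start recurs."""
    states = [start]
    current = step(start)
    while current != start:
        states.append(current)
        current = step(current)
    return states


STATES = cycle()
for clock, contents in enumerate(STATES):
    print(clock, contents)      # contents after `clock` signals = beta^clock
```

Starting from the pattern for 1, the register visits every nonzero 4-tuple exactly once before repeating, while the all-zero pattern maps to itself, which is why it is excluded.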


C. CODING THEORY 

Suppose that we wish to transmit a sequence of binary
digits across a noisy channel. If we send a one, a one will
probably be received, and if we send a zero, a zero will
probably be received. Occasionally, the channel noise will
cause a transmitted one to be received as a zero or a
transmitted zero to be received as a one. Although we are
unable to prevent the channel from generating such errors,
we can reduce their undesirable effects with the use of
coding [Ref. 5]. The basic idea is simple. A set of k
message digits which we wish to transmit is concatenated
with r check digits. The entire block of n = k+r channel
digits then forms the transmitted codeword. Assuming that
the channel noise changes sufficiently few of these n
transmitted channel digits, the redundancy afforded by the r
check digits provides the receiver with sufficient
information to detect and correct the channel errors.
Figure 5.2 illustrates the basic idea of the encoding
process for an (n,k) encoder with n = 15 and k = 11. The
codeword is constructed in such a way that the message
digits appear at the far right. The error-correcting
capability of the generated code depends upon the number of
check bits added. To illustrate, the binary code
constructed using the encoder of Figure 5.2 is capable of
correcting one error when, for example, n = 2^m - 1 and
k = 2^m - 1 - m for each integer m > 2, the so-called
Hamming single-error-correcting code.
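A small software model makes the (15,11) single-error-correcting case concrete. This is a sketch under stated assumptions, not the encoder of Figure 5.2 itself: it uses the generator polynomial 1+x+x^4, stores a binary polynomial as an integer whose bit i is the coefficient of x^i, and places the message in the high-order positions; the names `poly_mod`, `encode`, and `correct` are hypothetical.

```python
G = 0b10011          # g(x) = x^4 + x + 1
N, K = 15, 11        # n = 2^m - 1, k = 2^m - 1 - m with m = 4


def poly_mod(value, modulus=G):
    """Remainder of binary polynomial division (bit i <-> x^i)."""
    deg_m = modulus.bit_length() - 1
    while value.bit_length() - 1 >= deg_m:
        value ^= modulus << (value.bit_length() - 1 - deg_m)
    return value


def encode(message):
    """Form x^r * m(x) and append its remainder as r = 4 check bits."""
    shifted = message << (N - K)
    return shifted | poly_mod(shifted)


def correct(received):
    """Correct at most one flipped bit using the syndrome r(x) mod g(x)."""
    syndrome = poly_mod(received)
    if syndrome == 0:
        return received
    for position in range(N):
        # x^position mod g(x) is distinct for each of the 15 positions
        if poly_mod(1 << position) == syndrome:
            return received ^ (1 << position)
    raise ValueError("syndrome does not match a single-bit error")


codeword = encode(0b10110011101)
damaged = codeword ^ (1 << 9)          # channel flips one digit
print(f"{codeword:015b} {damaged:015b} {correct(damaged):015b}")
```

Because 1+x+x^4 is primitive, the fifteen single-bit error patterns produce fifteen distinct nonzero syndromes, so every single error is located uniquely; two or more errors would be miscorrected by this simple rule.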


D. MINIMAL POLYNOMIALS

In order for a code to correct every pattern of t or
fewer channel errors, the codewords must be generated by a
polynomial which is the product of at least t distinct
minimal polynomials [Ref. 5]. Occasionally, a code
possesses extra error-correcting capability beyond its
designed capacity. To understand this situation and the
general error-correcting capacity of the code, it is
necessary that we discuss some of the mathematical concepts
and properties of minimal polynomials before discussing the
actual implementation.

A minimal polynomial for a primitive element β over
GF(p) is the lowest-degree irreducible monic (leading
coefficient 1) polynomial M(x) with coefficients from GF(p)
such that M(β) = 0 [Ref. 11]. For example, the Galois field
GF(2^4) is constructed using the primitive element β, the
root of the irreducible polynomial 1+x+x^4. Then the minimal
polynomials for the elements 0, 1, β, β^3, β^5, and β^7 are
given in the following table.

Element    Minimal Polynomial
0          x
1          1+x
β          1+x+x^4
β^3        1+x+x^2+x^3+x^4
β^5        1+x+x^2
β^7        1+x^3+x^4

Furthermore, in GF(2^m) β^i and β^(2i) have the same minimal
polynomial. In general, if β^i is a root of a minimal
polynomial then so is β^(pi) (where p is the characteristic
of the ground field GF(p); in this case p = 2). To
illustrate, let us substitute the elements β and β^2 into
our minimal polynomial 1+x+x^4. Upon substitution of β we
obtain 1+β+β^4. Since β^4 = β+1 in GF(2^4), M(β) = 0.
Likewise, upon substituting β^2 for x in the same minimal
polynomial we obtain 1+β^2+β^8, which in GF(2^4) is also
zero, as can be seen in Table I of Chapter III. Elements of
the field with the same minimal polynomial are called
conjugates. In the same way, the imaginary roots i and -i
are referred to as conjugate complex numbers; they both have
the same minimal polynomial x^2+1 over the reals [Ref. 11].

From our preceding discussion, it is clear that β, β^2,
(β^2)^2 = β^4, and (β^4)^2 = β^8 all have the same minimal
polynomial 1+x+x^4. Likewise β^3, β^6, β^12, and β^24 = β^9
also have the same minimal polynomial 1+x+x^2+x^3+x^4. We
see that the powers of β fall into disjoint sets, called
cyclotomic cosets. In fact, all β^j whose exponents are
elements of the same cyclotomic coset have the same minimal
polynomial. The cyclotomic coset containing s consists of
the following exponents of β:

$$\{s,\ 2s,\ 2^2 s,\ 2^3 s,\ \ldots,\ 2^{m_s-1} s\}$$

where m_s is the smallest positive integer such that

$$2^{m_s} s \equiv s \pmod{2^m-1}$$

[Ref. 11]. For example, the cyclotomic cosets over GF(2^4)
are:

C_0 = {0}
C_1 = {1,2,4,8}
C_3 = {3,6,12,9}
C_5 = {5,10}
C_7 = {7,14,13,11}

Other cyclotomic coset decompositions for various values of
m are listed in Table III.
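The coset decomposition follows mechanically from the definition: start at s and keep multiplying by 2 modulo 2^m - 1 until the value returns to s. A minimal sketch, with `cyclotomic_cosets` as a hypothetical helper name:

```python
def cyclotomic_cosets(m, p=2):
    """Return {s: coset} for exponents modulo p^m - 1.

    Each coset is listed in the order s, p*s, p^2*s, ... and is keyed
    by its smallest member s.
    """
    n = p**m - 1
    seen, cosets = set(), {}
    for s in range(n):
        if s in seen:
            continue
        coset, j = [], s
        while j not in coset:
            coset.append(j)
            j = (j * p) % n
        cosets[s] = coset
        seen.update(coset)
    return cosets


for m in (3, 4, 5, 6):
    print(f"over GF(2^{m}):")
    for s, coset in cyclotomic_cosets(m).items():
        print(f"  C_{s} = {coset}")
```

The coset sizes always divide m, which is why GF(2^4) has cosets of size 1, 2, and 4, while GF(2^5) has only cosets of size 1 and 5.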

If we let M^(i)(x) represent the minimal polynomial of
β^i ∈ GF(p^m), it follows that if i is in the cyclotomic
coset C_s, then

$$M^{(i)}(x) = \prod_{j \in C_s} (x-\beta^j)$$
TABLE III

CYCLOTOMIC COSETS

OVER GF(2^3)
C_0 = {0}
C_1 = {1,2,4}
C_3 = {3,6,5}

OVER GF(2^5)
C_0 = {0}
C_1 = {1,2,4,8,16}
C_3 = {3,6,12,24,17}
C_5 = {5,10,20,9,18}
C_7 = {7,14,28,25,19}
C_11 = {11,22,13,26,21}
C_15 = {15,30,29,27,23}

OVER GF(2^6)
C_0 = {0}
C_1 = {1,2,4,8,16,32}
C_3 = {3,6,12,24,48,33}
C_5 = {5,10,20,40,17,34}
C_7 = {7,14,28,56,49,35}
C_9 = {9,18,36}
C_11 = {11,22,44,25,50,37}
C_13 = {13,26,52,41,19,38}
C_15 = {15,30,60,57,51,39}
C_21 = {21,42}
C_23 = {23,46,29,58,53,43}
C_27 = {27,54,45}
C_31 = {31,62,61,59,55,47}
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which is analogous to the generator polynomial g(x) in our 
generic architecture of the previous chapter. Moreover, by 
utilizing various techniques beyond the scope of this 
thesis, we may determine all the minimal polynomials of 
elements in GF($2^4$), as depicted in Table IV. Using this
table we may construct all the Reed-Solomon codes of block
length 15 which correct t or fewer channel errors. These
codes have the following generator polynomials:

t=1   $g(x)=M^{(1)}(x)=1+x+x^4$
t=2   $g(x)=M^{(1)}(x)\,M^{(3)}(x)=1+x^4+x^6+x^7+x^8$
t=3   $g(x)=M^{(1)}(x)\,M^{(3)}(x)\,M^{(5)}(x)=1+x+x^2+x^4+x^5+x^8+x^{10}$
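The products above can be verified with a few lines of code. This is an illustrative Python sketch only; it assumes a polynomial over GF(2) is stored as an integer whose bit i is the coefficient of $x^i$, and the helper names `gf2_mul` and `poly_str` are hypothetical.

```python
def gf2_mul(a, b):
    """Multiply two GF(2) polynomials (carry-less multiplication)."""
    result = 0
    while b:
        if b & 1:
            result ^= a        # add (XOR) the shifted copy of a
        a <<= 1
        b >>= 1
    return result


def poly_str(p):
    """Render a GF(2) polynomial in ascending powers of x."""
    terms = []
    for i in range(p.bit_length()):
        if (p >> i) & 1:
            terms.append("1" if i == 0 else "x" if i == 1 else f"x^{i}")
    return "+".join(terms)


M1 = 0b10011   # 1 + x + x^4
M3 = 0b11111   # 1 + x + x^2 + x^3 + x^4
M5 = 0b111     # 1 + x + x^2

if __name__ == "__main__":
    print("t=1:", poly_str(M1))
    print("t=2:", poly_str(gf2_mul(M1, M3)))
    print("t=3:", poly_str(gf2_mul(gf2_mul(M1, M3), M5)))
```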


Hence, the t-error-correcting RS code of block length $n$ is
then the cyclic code whose generator polynomial is the
product of the distinct minimal polynomials of
$\beta, \beta^2, \ldots, \beta^{2t}$. Of noteworthy interest is the
fact that an RS code over GF($2^5$) which is designed to
correct up to 4 errors is also able to correct 5 errors.
This is because $M^{(9)}(x)$, the minimal polynomial of $\beta^9$, is
identical to $M^{(5)}(x)$, the minimal polynomial of $\beta^5$.
Similarly, the 6 error-correcting RS code is identical to the 7
error-correcting code just as the 8-to-14 error-correcting
codes of length 31 are all identical to the 15 error-correcting
code. In a similar way, codes over GF($2^6$) and
GF($2^7$) are sometimes able to correct more errors than they
are designed to correct. The ability to correct these extra
error patterns depends upon finding higher powers of $\beta$ which
belong to cyclotomic cosets for the smaller powers of $\beta$
which belong to the code for the designed error-correcting
distance. The tables of cyclotomic cosets show, for example,
that $\beta^9$ falls in the coset of $\beta^5$ over GF($2^5$), while
$\beta^{17}$ falls in the coset of $\beta^5$ and $\beta^{19}$ in the coset
of $\beta^{13}$ over GF($2^6$). See [Ref. 11] for further discussion
of the error-correcting capabilities of given error-correcting
codes.

TABLE IV

MINIMAL POLYNOMIALS OF ELEMENTS IN GF($2^4$)

$M^{(1)}(x) = M^{(2)}(x) = M^{(4)}(x) = M^{(8)}(x) = 1+x+x^4$

$M^{(3)}(x) = M^{(6)}(x) = M^{(12)}(x) = M^{(9)}(x) = 1+x+x^2+x^3+x^4$

$M^{(5)}(x) = M^{(10)}(x) = 1+x+x^2$

$M^{(7)}(x) = M^{(14)}(x) = M^{(13)}(x) = M^{(11)}(x) = 1+x^3+x^4$


E. SYSTOLIC ARRAY MULTIPLIER

As mentioned earlier in this chapter, the systolic array
multiplier used in the generation of Reed-Solomon codewords
is much more complex than in the binary case. In this
section, we discuss the design of a systolic array multiplier
developed by Yeh, Reed, and Truong [Ref. 7] to assist
us in our implementation of an RS encoder.

According to [Ref. 7] several circuits have been proposed
to realize multiplication in GF($2^m$). Unfortunately,
these circuits are not suited for use in VLSI systems, due
to irregular wire routing, complicated control problems,
nonmodular structure and lack of concurrency. The systolic
array multiplier of [Ref. 7] performs the multiplication in
the field GF($2^m$) and overcomes some of these unwanted
attributes.

The systolic architecture is developed for performing
the product-sum computation, AB+C, in the finite field
GF($2^m$) of $2^m$ elements, where A, B, and C are arbitrary
elements of the field. The multiplier is a serial-in,
serial-out, one-dimensional systolic array which requires m
basic cells. To perform an isolated computation the multiplier
requires 3m time units; however, the average time per
computation is only m time units if a number of computations
are carried out consecutively. Because the architecture is
simple and regular and possesses the desirable properties of
concurrency and modularity, it is well suited for VLSI
implementation. [Ref. 7]

Consider the nonzero elements of GF($2^m$). They can be
represented as the powers of $\beta$, a primitive element of
the field as discussed in Chapter III. Since $f(\beta)=0$,
$\beta^m = f_{m-1}\beta^{m-1}+\cdots+f_1\beta+f_0$, where the coefficients
$f_i$ are determined by the polynomial $f(x)$ which $\beta$ satisfies.
Therefore an element of GF($2^m$) is of the form
$a_{m-1}\beta^{m-1}+\cdots+a_1\beta+a_0$ where $a_i \in$ GF(2) for
$0 \le i \le m-1$. In the following discussion, the polynomial
representation is used to represent the finite field GF($2^m$).

Let $A=a_{m-1}\beta^{m-1}+\cdots+a_1\beta+a_0$ and
$B=b_{m-1}\beta^{m-1}+\cdots+b_1\beta+b_0$ be two elements in GF($2^m$).
Then $A+B=s_{m-1}\beta^{m-1}+\cdots+s_1\beta+s_0$, where
$s_i=a_i+b_i$ (mod 2) for $0 \le i \le m-1$. Therefore addition
in GF($2^m$) is realized easily by m independent Exclusive-OR
gates.

Suppose $P=p_{m-1}\beta^{m-1}+\cdots+p_1\beta+p_0$ is the product of A and B,
i.e., P=AB. Then P can be written as follows:

$$P = \sum_{k=0}^{m-1} b_k\,(A\beta^k) = \sum_{k=0}^{m-1} b_k \left( \sum_{n=0}^{m-1} a_n^{(k)}\beta^n \right) = \sum_{n=0}^{m-1} \left( \sum_{k=0}^{m-1} a_n^{(k)} b_k \right) \beta^n \qquad (1)$$

where $a_n^{(k)}$ is the coefficient of $\beta^n$ in $A\beta^k$, i.e.,
$A\beta^k = a_{m-1}^{(k)}\beta^{m-1}+\cdots+a_1^{(k)}\beta+a_0^{(k)}$ for
$0 \le k \le m-1$. From equation (1) we obtain
$p_n = a_n^{(0)}b_0+a_n^{(1)}b_1+\cdots+a_n^{(m-1)}b_{m-1}$.

The computation of $A\beta^k$ can be performed recursively on k
for $0 \le k \le m-1$. Initially for k=0, $A\beta^0=A$, i.e.,
$a_n^{(0)}=a_n$ for $0 \le n \le m-1$. For $1 \le k \le m-1$,

$$A\beta^k = (A\beta^{k-1})\beta = \sum_{n=0}^{m-1} a_n^{(k-1)}\beta^{n+1} = a_{m-1}^{(k-1)}\beta^m + \sum_{n=1}^{m-1} a_{n-1}^{(k-1)}\beta^n \qquad (2)$$

Substituting $\beta^m=f_{m-1}\beta^{m-1}+\cdots+f_1\beta+f_0$ into
equation (2) yields

$$A\beta^k = a_{m-1}^{(k-1)} f_0 + \sum_{n=1}^{m-1} \left( a_{n-1}^{(k-1)} + a_{m-1}^{(k-1)} f_n \right) \beta^n \qquad (3)$$

From equation (3), we obtain

$$a_n^{(k)} = a_{n-1}^{(k-1)} + a_{m-1}^{(k-1)} f_n \quad \text{for } 1 \le n \le m-1, \qquad a_0^{(k)} = a_{m-1}^{(k-1)} f_0 \qquad (4)$$
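Equations (1) and (4) translate directly into a bit-serial software loop. The following Python sketch is a software analogue only, not the systolic hardware of [Ref. 7]; it assumes field elements are stored as integers whose bit n is the coefficient of $\beta^n$, and the function name `gf_mult_add` and the argument `f_low` (the bits $f_0,\ldots,f_{m-1}$) are hypothetical.

```python
def gf_mult_add(a, b, c, f_low, m):
    """Return A*B + C in GF(2^m).

    a, b, c : m-bit integers, bit n = coefficient of beta^n
    f_low   : bits f_0..f_{m-1}, so that beta^m = f_{m-1} beta^{m-1} + ... + f_0
    """
    mask = (1 << m) - 1
    p = c                      # partial sum starts at C
    a_k = a                    # holds A * beta^k, starting with k = 0
    for k in range(m):
        if (b >> k) & 1:       # equation (1): add b_k * (A beta^k)
            p ^= a_k
        top = (a_k >> (m - 1)) & 1
        a_k = (a_k << 1) & mask   # equation (4): a_n <- a_{n-1}
        if top:
            a_k ^= f_low          # ... plus a_{m-1} * f_n
    return p


if __name__ == "__main__":
    # GF(2^4) with beta^4 = 1 + beta, i.e. f_low = 0b0011
    # beta * beta^3 = beta^4 = 1 + beta
    print(bin(gf_mult_add(0b0010, 0b1000, 0, 0b0011, 4)))
```

Each pass through the loop plays the role of one cell of the array: it consumes one bit of B and hands the shifted-and-reduced copy of A to the next pass.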

Table V indicates the step-by-step procedure for computing
P=AB+C in GF($2^4$). In Table V, $a_n^{(k)}$, $b_n$, $c_n$, $f_n$, and
$p_n$ are the nth bits of $A\beta^k$, B, C, F, and P, respectively,
where F is the primitive polynomial and $p_n^{(i)}$ is the partial
sum of $p_n$.

Figure 5.3 depicts the systolic multiplier for our given
finite field. The primitive polynomial is
$F=f_3\beta^3+f_2\beta^2+f_1\beta+f_0$. Cell $L_i$ receives the bit $b_i$ of B.
The n-th bits $c_n$, $a_n$, and $f_n$ of C, A, and F, respectively,
are received serially at inputs $e_0$, $g_0$, and $h_0$. Two control
signals, START (0001) and END (0111), are used in the design with
inputs $t_0$ and $r_0$ receiving the signals, respectively.
Output $e_4$ serially transmits the n-th bit, $p_n$, of the
result P out of the system. The order of the inputs and the
outputs is also shown in Figure 5.3. The flip-flops (FF)
associated with inputs $t_0$ and $h_0$ are used for the purpose of
synchronization.

The circuit of cell $L_i$ is shown in Figure 5.4. The
operation of the flip-flops in the system is synchronized
implicitly by a clock signal. When $r_i^*=1$, $u_i=g_i^*$ at the
next time unit (through switch SW). Additionally, when
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TABLE V 


COMPUTATION OF P = AB + C IN GF(24) 


STEP 
NUMBER OPERATIONS 


| рз (9) = сз a OEN 


рз (1) = рз (0 )+аз (0 )bg , a2 (0) = а 


р» (9) = C2 


! р» (1) = р» (9 )+а> (0 )bg , aj (0) = ај 
Р] (0) = С] аз (1) = ag (0 )+аз (0225 
рз (2) = p3 (0 жаз (1251, ag (0) = ag 


ро (9) = CQ 


| р2(2) = р2(1)+а>(1)Ь1, aj (1) = ag (00«a4 (0) £i 
po (1) = py (9 tag (9 by, аз (2) = аз (1)+аз (1) ғ 
| рз(3) = р3(2)+аз (2)52, ag (1) = a3(9) £5 
р1(2) = р1 (1)+а1 (1)5], а2(2) = а) (1 )ља(1)5 
| ро (3) 2» p5(2)«a5(2)b5, аі (2) » ag(1)«a4 (128 
ро 12) з ру (Іда (175, аз (3) з аз (2)ч аз (2) Ез 
рз = рз(4) з рз(3)чаз (3)ь3, ад2) з аз (1)5 
р2 = pol4) = pol3)4+an(3)b3, az (3) = ag (2)4a3 (2) £1 
| po (3) = py (2 )4+aq (2) bo, 
10 рү = py (4) = р1(3)+а1(3)63, ag 3) = аз (2) ғ; 
al Po = Po (4) = py (3)4+aq (3 )b3, 
$r_i^*=0$, $u_i$ retains its value. Two principal operations of
the system are the following:
* 
еј] <---- (41:41) Ә еі 
дікі” <--- (ч1һ17) (31 51) 


where $0 \le i \le 3$, $\oplus$ denotes the Exclusive-OR operation, i.e.,
modulo-2 addition, and the backwards arrow denotes the
substitution operation.

A comparison of the procedure in Table V and the
structure in Figures 5.3 and 5.4 yields the following facts:
The signal $u_i$ in $L_i$ is equal to $a_3^{(i)}$ in $A\beta^i$. The signal
$g_i^*$ is equal to $a_n^{(i)}$ in $A\beta^i$ for some n. The signal $e_i^*$ is
equal to the partial sum of AB+C.

The multiplier in Figure 5.3 can be generalized to the
finite field GF($2^m$) by simply concatenating m identical
cells. Furthermore, additional registers and control signals
would be required if the $b_i$'s are fed serially into the
system in the same manner as the $a_i$'s. [Ref. 7]
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VI. IMPLEMENTATION 


A. BINARY ENCODER 
In this section we discuss the encoding process for a 
binary code and utilize a primitive shift register design to 
implement both a single-error-correcting binary encoder and 
a double-error-correcting binary encoder. 
1. Encoding Process 
As discussed in Chapter V, an (n,k) code can be
generated with a polynomial of degree n-k. If the polynomial
is primitive of degree r and $n=2^r-1$, the code can be
encoded and decoded with primitive shift registers. Hence,
we restrict our attention solely to the case of primitive
polynomials.
We illustrate this procedure by generating the
(15,11) binary code using the primitive polynomial
$p(x)=1+x+x^4$. Here n-k=4, r=4, $n=2^4-1=15$, and k=11. The
encoding process for the 11-bit message 10101010101 proceeds
as in the example below.


Example of Encoding Process:
Message = 10101010101

1) Represent the message as a polynomial:
   $m(x)=1+x^2+x^4+x^6+x^8+x^{10}$

2) Multiply m(x) by $x^{n-k}$ to shift the message digits to
   the far right:
   $x^4m(x)=x^4+x^6+x^8+x^{10}+x^{12}+x^{14}$

3) Calculate the remainder when $x^{n-k}m(x)$ is divided by p(x):
   $r(x)=1+x+x^3$

4) Form the code polynomial as the sum $x^{n-k}m(x)+r(x)$, a
   multiple of p(x):
   $c(x)=1+x+x^3+x^4+x^6+x^8+x^{10}+x^{12}+x^{14}$

Code Word = 110110101010101
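The four steps of the example can be reproduced in software. The sketch below is illustrative only and is not the shift-register circuit itself; it assumes bit strings are written with the coefficient of $x^0$ first, as in the example, and the helper names `gf2_mod` and `encode` are hypothetical.

```python
def gf2_mod(dividend, divisor):
    """Remainder of GF(2) polynomial division (ints, bit i = coefficient of x^i)."""
    dlen = divisor.bit_length()
    while dividend.bit_length() >= dlen:
        dividend ^= divisor << (dividend.bit_length() - dlen)
    return dividend


def encode(message_bits, p=0b10011, r=4):
    """Systematic encoding: c(x) = x^r m(x) + (x^r m(x) mod p(x))."""
    m = sum(1 << i for i, ch in enumerate(message_bits) if ch == "1")  # step 1
    shifted = m << r                                                   # step 2
    remainder = gf2_mod(shifted, p)                                    # step 3
    codeword = shifted ^ remainder                                     # step 4
    n = len(message_bits) + r
    return "".join("1" if (codeword >> i) & 1 else "0" for i in range(n))


if __name__ == "__main__":
    print(encode("10101010101"))   # 110110101010101
```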

Note that codewords in this code are formed as multiples of
the primitive generating polynomial p(x). As p(x) is of
degree r there are n-r=k information symbols which can be
chosen freely and then r check symbols are chosen so that
the resulting codeword satisfies this criterion, namely that
the codewords are multiples of the generator polynomial. In
other words, the check digits are the coefficients of the
remainder r(x) upon division of $x^{n-k}m(x)$ by p(x) as shown
below.

                   x^10 + x^8 + x^7 + x^5 + x^4 + x^3 + 1
              ---------------------------------------------
x^4 + x + 1 ) x^14 + x^12 + x^10 + x^8 + x^6 + x^4
              x^14 + x^11 + x^10
              ------------------
              x^12 + x^11 + x^8 + x^6 + x^4
              x^12 + x^9 + x^8
              ------------------
              x^11 + x^9 + x^6 + x^4
              x^11 + x^8 + x^7
              ------------------
              x^9 + x^8 + x^7 + x^6 + x^4
              x^9 + x^6 + x^5
              ------------------
              x^8 + x^7 + x^5 + x^4
              x^8 + x^5 + x^4
              ------------------
              x^7
              x^7 + x^4 + x^3
              ------------------
              x^4 + x^3
              x^4 + x + 1
              ------------------
              x^3 + x + 1

2. Single-Error-Correcting Binary Encoder


By utilizing the previously discussed concepts, we
may now describe the encoding process of the binary (15,11)
code as implemented in a primitive shift register shown in
Figure 6.1. By simply feeding in the message m(x) at the
$x^4$-stage we are able to simulate the effect of multiplying
m(x) by $x^4$. The switch remains in position 1 as m(x) is fed
completely into the shift register. The shift register
computes the remainder when $x^4m(x)$ is divided by p(x), as the
shift register is in essence a division circuit. The
register contents after the information bits have all been
fed into the register are the remainder after division of the
shifted information polynomial by the generator polynomial
p(x). In the example the remainder is $1101=1+x+x^3$. The
switch is then changed to position 2 to allow the check digits
to follow the message digits, producing the coded output
110110101010101 for the example given. [Ref. 10]

3.  Double-Error-Correcting Binary Encoder 

To design a binary encoder that corrects up to two
errors, additional redundancy must be added. Since we are
now concerned with correction of up to two errors, the
generator polynomial is the product of the two distinct
minimal polynomials $M^{(1)}(x)$ and $M^{(3)}(x)$ as described in
the previous chapter. Their product is the polynomial
$1+x^4+x^6+x^7+x^8$. The implementation of the encoder is
carried out in essentially the same manner as its
single-error counterpart. The encoder is presented in
Figure 6.2. Now n=15 and k=15-8=7 so that there are a
smaller number of codewords ($2^7$) in this more powerful code.
As the error-correcting capability of the code increases,
the number of information bits correspondingly decreases.

Figure 6.1 A Single-Error-Correcting Binary Encoder


B. REED-SOLOMON ENCODER

In this section we draw upon the work of Liu [Ref. 8]
and our acquired knowledge of finite field theory and
Reed-Solomon codes to produce an RS encoder.

As discussed in Chapter IV, an RS codeword has ($2^J-1$)
symbols each of which is J bits wide. Of the ($2^J-1$) symbols
there are ($2^J-1-2E$) information symbols and 2E parity-check
symbols, where E is the number of symbol errors the RS code
is able to correct. If we treat the ($2^J-1-2E$) information
symbols as the coefficients of the polynomial

$$f(x) = \sum_{i=2E}^{2^J-2} m_i x^i = m_{2^J-2}\,x^{2^J-2} + \cdots + m_{2E}\,x^{2E}$$

then the 2E parity-check symbols can be obtained as the
coefficients of the remainder of f(x)/g(x) where g(x) is the
generator polynomial of the code. Usually, g(x) is defined
as

$$g(x) = \prod_{i=1}^{2E} (x-\beta^i) = \sum_{i=0}^{2E} g_i x^i$$

where $\beta$ is a primitive element of the Galois field GF($2^J$)
and the $g_i$'s are the coefficients of g(x) with $g_{2E}=1$.

Figure 6.2 A Double-Error-Correcting Binary Encoder


A diagram of the RS encoder which generates the
remainder of f(x)/g(x) is given in Figure 6.3. It is
composed of 2E systolic array multipliers, 2E "exclusive-or"
adders, and 2E shift registers. The coefficients of the
generator polynomial g(x) are fed into their respective
systolic multipliers where the finite field multiplication
A*B occurs, as discussed in Chapter V. Upon completion the
partial product is "exclusive-or'ed" with the contents C
of the previous shift register and distributed down the line
to the next shift register in a pipeline fashion. The
switches are normally in the "ON" position until the last
information symbol goes into the encoder. At this moment
all the switches are turned to the "OFF" position and the
encoder behaves like a long shift register. The output of
the encoder is then taken from the output of the last shift
register. [Ref. 8]
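To make the data flow concrete, the following Python sketch computes g(x) and the 2E parity symbols in software. It is a behavioural analogue under stated assumptions, not the architecture of [Ref. 8]: the field is fixed to GF($2^4$) built from $1+x+x^4$, symbols are 4-bit integers, multiplication uses log/antilog tables instead of systolic multipliers, and the names `generator_poly` and `rs_encode` are hypothetical.

```python
N = 15                                   # 2^J - 1 with J = 4
EXP = [1] * (2 * N)                      # EXP[t] = beta^t as a 4-bit integer
for t in range(1, 2 * N):
    v = EXP[t - 1] << 1                  # multiply by beta
    EXP[t] = v ^ 0b10011 if v & 0b10000 else v
LOG = {EXP[t]: t for t in range(N)}


def mul(a, b):
    """Product of two GF(2^4) elements."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def generator_poly(E):
    """g(x) = prod_{i=1}^{2E} (x - beta^i), returned as [g_0, ..., g_2E]."""
    g = [1]
    for i in range(1, 2 * E + 1):
        nxt = [0] * (len(g) + 1)
        for d, coeff in enumerate(g):
            nxt[d + 1] ^= coeff                 # coeff * x
            nxt[d] ^= mul(coeff, EXP[i])        # coeff * beta^i (minus = plus here)
        g = nxt
    return g


def rs_encode(info, E):
    """info[i] is the coefficient of x^(2E+i); returns all N codeword symbols."""
    g = generator_poly(E)
    reg = [0] * (2 * E)                         # the 2E shift-register stages
    for sym in reversed(info):                  # highest-order symbol enters first
        fb = sym ^ reg[-1]                      # feedback symbol
        for d in range(2 * E - 1, 0, -1):
            reg[d] = reg[d - 1] ^ mul(fb, g[d]) # multiply by g_d, then XOR
        reg[0] = mul(fb, g[0])
    return reg + list(info)                     # parity symbols, then information
```

After the last information symbol has entered, `reg` holds the remainder of f(x)/g(x), which corresponds to the moment the switches are turned "OFF" in the description above.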


C. BINARY DECODER

In this section we discuss the decoding process and
design a single-error-correcting binary decoder and a
double-error-correcting binary decoder, both of which can be
used in conjunction with the binary encoders of Section A of
this chapter.

1. Decoding Process

The decoding process is, in general, much more
complicated than the encoding process. Not only must we
deal with the detection of errors but also with their
correction. As a result, we must be able to design a
decoder which simultaneously detects and corrects errors.

Error detection is usually much easier than error
correction. Recall that a code polynomial is a multiple of
the generating polynomial p(x). In other words, the
received polynomial u(x) will be a code polynomial if and
only if the remainder upon division of u(x) by p(x) is zero,
i.e., $u(x) \equiv 0$ modulo p(x). An example is given in
Table VI. After u(x) is fed completely into the detecting
division register, the register will contain u(x) modulo
p(x). If any of the register contents are nonzero, u(x) is
not a valid codeword. Thus the shift register acts as an
error detector by performing a division of u(x) by p(x). In
fact, the nonzero contents not only indicate that an error
has occurred in transmission, but those contents also
indicate the error pattern needed to correct the error and
the location of the error in the transmitted codeword.
[Ref. 10]
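The detection rule can be illustrated in software. This Python sketch is a stand-in for the detecting division register, under the assumption that bit strings list the coefficient of $x^0$ first; the function name `syndrome` is hypothetical. It is applied to the codeword of Section A and to the same word with the coefficient of $x^5$ inverted.

```python
def syndrome(word_bits, p=0b10011, r=4):
    """Return u(x) mod p(x) as an r-digit string, coefficient of x^0 first."""
    u = sum(1 << i for i, ch in enumerate(word_bits) if ch == "1")
    plen = p.bit_length()
    while u.bit_length() >= plen:               # long division over GF(2)
        u ^= p << (u.bit_length() - plen)
    return "".join("1" if (u >> i) & 1 else "0" for i in range(r))


if __name__ == "__main__":
    print(syndrome("110110101010101"))   # 0000: a valid codeword
    print(syndrome("110111101010101"))   # 0110: an error is detected
```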

2. Single-Error-Correcting Binary Decoder 

Because of the complexity of the decoding process,
we will initially design an error detection register
followed by its error correction counterpart and then
synthesize them together to implement the complete decoder.
To begin, we utilize the error detection register of Figure
6.4. It is identical to the encoding register of Figure 6.1
except that the received codeword is input to the decoder at
the left end of the register. If the received word is
110111101010101, then the nonzero contents 0110 after
division indicate that an error has occurred in transmission.
In order to correct the received word we need to
know the error position.

TABLE VI

VERIFICATION OF THE CODE POLYNOMIAL

The received word can be viewed as a polynomial u(x)
which can be written as the sum of the code polynomial c(x)
and an error polynomial e(x), namely u(x) = c(x) + e(x).
The error polynomial e(x) has ones in its error positions
and zeros elsewhere, and addition is term by term modulo 2.

Since the codewords c(x) are generated as multiples
of the generator polynomial g(x) and since $\beta$ is a root of
g(x), the code polynomials evaluated at $\beta$ are equal to zero,
namely $c(\beta)=0$. Thus $u(\beta)=c(\beta)+e(\beta)=e(\beta)$. Since we
assume in this sub-section that only single errors have
occurred in transmission we can also assume that if an error
occurs then e(x) is a power of x, say $e(x)=x^i$ for some i.
Thus $u(\beta)=e(\beta)=\beta^i$.

In order to correct the error we need to compute
$u(\beta)$, which is called the syndrome of the received word, and
then find the specific value i for which $u(\beta)=\beta^i$. The
value i will indicate the error position. We need then only
set $c(x)=u(x)+x^i$ to obtain the code polynomial c(x), a
multiple of p(x) which is "nearest" to the received polynomial
u(x). The primitive shift register facilitates this
task because while it is computing u(x) modulo p(x) it also
leaves the coefficients of $u(\beta)=\beta^i$ in the shift register.
[Ref. 10]

Figure 6.4 The Error Detection Register

For example, in Figure 6.5 the primitive shift register
computes $u(x)=1+x+x^3+x^4+x^5+x^6+x^8+x^{10}+x^{12}+x^{14}$ modulo
$p(x)=1+x+x^4$ and the syndrome is $0110=x+x^2$. Note from
Table I of Chapter III that 0110 is the 4-digit representation
of $\beta^5$. Hence the error in the received polynomial
occurs in the position of $x^5$. Therefore, the code polynomial
is $c(x)=u(x)+e(x)=(1+x+x^3+x^4+x^5+x^6+x^8+x^{10}+x^{12}+x^{14})+x^5$
$=1+x+x^3+x^4+x^6+x^8+x^{10}+x^{12}+x^{14}$. The corrected codeword
is 110110101010101 and the corrected information symbols are
10101010101. The same procedure is also illustrated in
Table VII by the actual long division process.
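The table-lookup form of this correction can be sketched as follows. The code is illustrative only: it rebuilds the table of powers of $\beta$ rather than reading Table I, assumes bit strings list the coefficient of $x^0$ first, and uses the hypothetical names `power_table` and `correct_single`.

```python
def power_table(p=0b10011, m=4):
    """Integers representing beta^0 .. beta^(2^m - 2); bit n = coefficient of beta^n."""
    table, state = [], 1
    for _ in range((1 << m) - 1):
        table.append(state)
        state <<= 1                  # multiply by beta
        if state >> m:
            state ^= p               # reduce with p(beta) = 0
    return table


def correct_single(word_bits, p=0b10011, m=4):
    """Correct at most one error; returns (corrected word, error position or None)."""
    u = sum(1 << i for i, ch in enumerate(word_bits) if ch == "1")
    s = u
    while s.bit_length() > m:        # syndrome = u(x) mod p(x)
        s ^= p << (s.bit_length() - m - 1)
    if s == 0:
        return word_bits, None
    i = power_table(p, m).index(s)   # find i with beta^i equal to the syndrome
    u ^= 1 << i                      # c(x) = u(x) + x^i
    fixed = "".join("1" if (u >> k) & 1 else "0" for k in range(len(word_bits)))
    return fixed, i


if __name__ == "__main__":
    print(correct_single("110111101010101"))   # ('110110101010101', 5)
```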

We now examine the primitive shift register decoding
process which performs the error correction. After the
syndrome is computed by the primitive shift register
division process, an additional primitive shift register of
the same type can be used to correct the error without
reference to a table of powers of the primitive element $\beta$.
The correcting register shown in Figure 6.6 is basically the
same primitive shift register used throughout this chapter
with the exception that there are output lines leading from
each of the four registers. If the correcting register
is set initially at 0100, the 4-digit representation for
the element $\beta$, then, as it shifts, the output is the same
cycle as the 4-digit representation listing of the elements
$\beta^i$ (i=1,2,3,...,15) in Table I, since a shift in the primitive
shift register is the same as multiplication by $\beta$. No
matter which state the register is set to initially, the
correcting register will output elements of that maximum
cycle in the same cycle order as long as the register
continues to shift. If the register is set at $\beta^i$ it will
be in state $\beta^{i+j}$ after j shifts. [Ref. 10]

TABLE VII

SYNDROME CALCULATION USING LONG DIVISION

Figure 6.7 (the complete single-error-correcting
binary decoder) shows the received word of our example,
namely 110111101010101, whose polynomial form is
$1+x+x^3+x^4+x^5+x^6+x^8+x^{10}+x^{12}+x^{14}$, in a storage register
and the syndrome 0110 in the correcting register. From our
previous discussion we know that the error occurs in $u_5$.
Thus, if the detector register has output 1 as $u_5$ leaves the
storage register and 0 otherwise, the word 110111101010101
will be corrected after fifteen shifts to read
110110101010101. We illustrate how the correcting register is
used to accomplish this task by listing the new states of the
correcting register, and the outputs from the storage register
and decoder after each shift. The states are depicted in
Table VIII.

Figure 6.7 The Complete Single-Error-Correcting Binary Decoder

Observe that the incorrect digit $u_5$ leaves the storage
register when the correcting register is in state 1000. If
the detector is made to produce an output 1 when it detects
1000 and 0 otherwise, then $u_5$ will be properly corrected. In
general, if the syndrome is $\beta^i$, then the error occurs in the
coefficient of $x^i$, namely $u_i$, where the received polynomial
has the form

$$u(x) = \sum_{i=0}^{n-1} u_i x^i$$

If j is such that $u_{n-j}=u_i$, then $u_i$ leaves the storage
register when the correcting register is in state $\beta^{i+j}$ as
shown below:

State            Output
$\beta^{i+2}$    $u_{n-2}$
  ...              ...
$\beta^{i+j}$    $u_{n-j}=u_i$

Since $\beta^{i+j}=\beta^n=1$, the detector will correct the digit $u_i$ and
the received word will be corrected to the nearest code word
after the decoder completes this process. [Ref. 10]

To recapitulate the correction process, the detecting
register computes the syndrome of the received word. As
each digit of u(x) enters the detecting register it
simultaneously enters the storage register. When the syndrome
is determined, it is transferred to the correcting register
for the error-correcting procedure just described.
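The interplay of the three registers can be imitated in software. The following Python sketch is a behavioural model only, not a gate-level description of Figure 6.7; it assumes the storage register releases $u_{n-j}$ after the j-th shift of the correcting register, and the function name `decode_with_correcting_register` is hypothetical.

```python
def decode_with_correcting_register(word_bits, p=0b10011, m=4):
    """Single-error correction using a syndrome register and a correcting register."""
    n = len(word_bits)
    u = [int(ch) for ch in word_bits]        # u[i] = coefficient of x^i

    # Detecting register: u(x) mod p(x), highest-order digit fed first.
    state = 0
    for bit in reversed(u):
        state = (state << 1) ^ bit
        if state >> m:
            state ^= p

    # Correcting register starts at the syndrome; each shift multiplies by beta.
    out = u[:]
    for j in range(1, n + 1):
        state <<= 1
        if state >> m:
            state ^= p
        if state == 1:                       # detector for state 1000 (beta^0)
            out[n - j] ^= 1                  # complement the digit leaving storage
    return "".join(str(b) for b in out)


if __name__ == "__main__":
    print(decode_with_correcting_register("110111101010101"))
```

For the example word the syndrome is $\beta^5$, so the detector fires on the tenth shift, exactly when $u_5$ leaves the storage register.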

3. Double-Error-Correcting Binary Decoder 

To implement a double-error-correcting binary
decoder we begin with a general analysis of the three stages
that comprise decoding. The first stage is the Syndrome
Generator stage. The syndrome is defined as the nonzero
remainder of the received polynomial when it is divided by
the generator polynomial in the given primitive shift
register. The second stage, or the Central Galois Field
Processor, finds the error locator polynomial $\sigma(z)$ (usually
accomplished by using Berlekamp's iterative algorithm or
Massey's linear feedback shift register synthesis algorithm).
At this stage the polynomial is determined which defines the
location of the errors that have occurred in transmission.
Finally, the third stage, or the Chien Searcher stage, finds
the roots of $\sigma(z)$ to determine which digits should be
corrected. Note that in the binary code, correction is
trivial when the location of the errors is determined, i.e.,
the bit in error need only be complemented. [Ref. 11]
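The three stages can be sketched compactly for the length-15 double-error-correcting code. The Python below is an illustration under stated assumptions and not the thesis design: the second stage solves for $\sigma(z)=1+\sigma_1 z+\sigma_2 z^2$ directly from $S_1=u(\beta)$ and $S_3=u(\beta^3)$ (a closed-form shortcut that is valid for at most two errors) instead of running Berlekamp's or Massey's algorithm, and the names `evaluate` and `decode_double` are hypothetical.

```python
N = 15
EXP = [1] * (2 * N)                          # EXP[t] = beta^t in GF(2^4), p(x) = 1+x+x^4
for t in range(1, 2 * N):
    v = EXP[t - 1] << 1
    EXP[t] = v ^ 0b10011 if v & 0b10000 else v
LOG = {EXP[t]: t for t in range(N)}


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def evaluate(bits, power):
    """Stage I: u(beta^power) for a bit string with the coefficient of x^0 first."""
    acc = 0
    for i, ch in enumerate(bits):
        if ch == "1":
            acc ^= EXP[(power * i) % N]
    return acc


def decode_double(bits):
    s1, s3 = evaluate(bits, 1), evaluate(bits, 3)
    if s1 == 0:
        return bits                          # no correctable error pattern detected
    # Stage II: sigma_1 = S1, sigma_2 = (S1^3 + S3) / S1
    s1_cubed = mul(s1, mul(s1, s1))
    sigma2 = mul(s1_cubed ^ s3, EXP[N - LOG[s1]])
    # Stage III: Chien search, testing sigma(beta^-i) for every position i
    out = list(bits)
    for i in range(N):
        z = EXP[(N - i) % N]
        if (1 ^ mul(s1, z) ^ mul(sigma2, mul(z, z))) == 0:
            out[i] = "1" if out[i] == "0" else "0"
    return "".join(out)
```

Because every root found by the search marks a digit to complement, single and double errors are handled by the same loop; with one error $\sigma_2$ evaluates to zero and $\sigma(z)$ has a single root.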

Using our previous double-error-correcting generator
polynomial $1+x^4+x^6+x^7+x^8$, which is the product of
$(1+x+x^4)(1+x+x^2+x^3+x^4)$, we are able to produce Stage I of
the decoding process as illustrated by the division process
in Figure 6.8. Similarly, we are also able to produce
Stages II and III (Figures 6.9 and 6.10, respectively) along
with a block diagram of the complete decoder in Figure
6.11.

The operation of the decoder is relatively straightforward,
as in the previous section. Utilizing a buffer
capable of storing 2n digits, the Chien Searcher is in the
process of computing $\sigma(z)$ in order to determine whether or
not the next digit to leave the buffer should be corrected.
The Syndrome Generator at the same time computes the
syndrome of the received word while the Central Galois Field
Processor finds the error-locator polynomial for the
buffered word. Once the coefficients of the error-locator
polynomial are read out of the Central Galois Field
Processor and into the Chien Searcher, the syndrome or the
nonzero remainder of the next block of received words is
read back into the Central Galois Field Processor for
continual operation. See [Ref. 5] for further details of the
multiple error correction process.

If the Central Galois Field Processor operates so
fast that it is able to compute the error location before
all of the new received word arrives, then the buffer size
may be reduced. In general, the buffer is made big enough
to accommodate the expected worst case for the time to
compute the locations of the two errors. However, for
example, suppose that the Central Galois Field Processor is
able to compute the error locator in half the time required
for n digits to be received from the channel. In that case,
the buffer need only be capable of storing 3n/2 digits.
After a complete word is received, the central processor
computes its error location by the time the beginning of
this word is ready to leave the buffer. The error locator
is then fed into the Chien Searcher, and the central processor
sits idle until the rest of the incoming word is
received. See [Ref. 5] for details.

Figure 6.8 Stage I: The Syndrome Generator

Although the above discussion pertains strictly to a
binary decoder capable of correcting two errors, it can be
generalized to correct t or fewer errors. By expanding the
hardware in Stages I and III to accommodate the additional
shift register size required by t distinct minimal polynomials,
we are able to implement the decoder with approximately
the same effectiveness. Likewise, the same procedure
of utilizing the product of t distinct minimal polynomials
would also be used in the design of a multiple-error-correcting
binary encoder.


D. REED-SOLOMON DECODER
As with any multiple-error detection and correction
process, the decoding of RS codes is very complex. As a
result, the known decoding procedures as discussed by Liu
[Ref. 12] will be presented in this section to obtain a
repetitive and recursive technique which is suitable for
systolic array development and eventual VLSI implementation.
Recall that the information symbols of an RS code are
treated as the coefficients of the polynomial f(x). If we
let

$$f(x) = f_0+f_1x+\cdots+f_{N-1}x^{N-1}$$

be the transmitted code vector (where N = codeword length),
and let

$$r(x) = r_0+r_1x+\cdots+r_{N-1}x^{N-1}$$

be the received code vector over a noisy channel, then the
error pattern added by the channel is

$$e(x) = r(x)-f(x) = e_0+e_1x+\cdots+e_{N-1}x^{N-1}.$$

The first step of the decoding procedure is to store the
received code vector $r_j$ into the buffer register and then
compute the syndrome $S_i$ using the equation

$$S_i = r(\beta^{k+i}) = \sum_{j=0}^{N-1} r_j\,\beta^{(k+i)j} \qquad (5)$$

where $0 \le i \le 2E-1$. Since $r_j=f_j+e_j$, equation (5) can be
expressed as

$$S_i = \sum_{j=0}^{N-1} (f_j+e_j)\,\beta^{(k+i)j} = \sum_{j=0}^{N-1} f_j\,\beta^{(k+i)j} + \sum_{j=0}^{N-1} e_j\,\beta^{(k+i)j} = F_{k+i}+E_{k+i} \qquad (6)$$

In the above equation

$$F_{k+i} = \sum_{j=0}^{N-1} f_j\,\beta^{(k+i)j} \qquad (7)$$

and

$$E_{k+i} = \sum_{j=0}^{N-1} e_j\,\beta^{(k+i)j} \qquad (8)$$

Note that in equation (8) $E_{k+i}$ is the finite field transform
of the $e_j$'s. Because f(x) is a codeword, $F_{k+i}=0$ over this
range of i, so the syndromes give $S_i=E_{k+i}$ directly.


The second step of the decoding procedure is to compute
$\sigma_s$ for $1 \le s \le v$ (where v = number of errors) using the
equation

$$E_{k+i} = \sum_{s=1}^{v} \sigma_s E_{k+i-s}, \qquad v \le i \le 2E-1$$

from the syndromes computed in the previous step. This
can be accomplished using Berlekamp's iterative algorithm or
Massey's linear feedback shift register (LFSR) synthesis
algorithm. [Ref. 12]

Upon obtaining the $\sigma_s$'s, the third step of the decoding
procedure is to use the recursive equation

$$E_{k+i} = \sum_{s=1}^{v} \sigma_s E_{k+i-s} \qquad \text{for } 2E \le i \le N-1$$

where

$$E_{k+i} = E_{k+i-N} \qquad \text{for } k+i \ge N$$

to compute the remaining $E_{k+i}$ for $2E \le i \le N-1$.

After determining the transform of the error pattern
$E_{k+i}$ for $0 \le k+i \le N-1$, by equation (8), we can then apply
the inverse transform to $E_{k+i}$ to obtain the error pattern
$e_j$, i.e.,

$$e_j = (N)^{-1} \sum_{k+i=0}^{N-1} E_{k+i}\,\beta^{-(k+i)j}$$

for j=0,1,2,...,N-1. Then the corrected codeword is
obtained by subtracting the error pattern $e_j$ from the stored
code vector $r_j$ in the buffer register.
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In summary, the decoding of an RS code is composed of 
the following five steps: 


1) Compute the syndrome $S_i$ using the equation

$$S_i = r(\beta^{k+i}) = \sum_{j=0}^{N-1} r_j\,\beta^{(k+i)j}$$

2) Use Berlekamp's iterative algorithm or Massey's
LFSR synthesis algorithm to determine the coefficients
of the error locator polynomial $\sigma_s$ from the known
$S_i=E_{k+i}$ for i=0,1,2,...,2E-1.

3) Compute the remaining $E_{k+i}$ from the known $S_i$ using the
equation

$$E_{k+i} = \sum_{s=1}^{v} \sigma_s E_{k+i-s}$$

for $2E \le i \le N-1$.

4) Compute the inverse transform

$$e_j = (N)^{-1} \sum_{k+i=0}^{N-1} E_{k+i}\,\beta^{-(k+i)j}$$

to obtain the error pattern, where $(N)^{-1}$ is the
inverse of N.

5) Subtract the error pattern $e_j$ from the received code
vector $r_j$ in the buffer memory to obtain the corrected
codeword.
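The five steps can be traced in a small software model. The Python sketch below is illustrative and makes several assumptions that are not taken from [Ref. 12]: the field is GF($2^4$) built from $1+x+x^4$ with N=15, the offset is k=1, step 2 uses Massey's LFSR synthesis, and all function names are hypothetical. Since N is odd, $(N)^{-1}=1$ in a field of characteristic 2, and subtraction is an Exclusive-OR.

```python
N = 15
EXP = [1] * (2 * N)                              # EXP[t] = beta^t as a 4-bit integer
for t in range(1, 2 * N):
    v = EXP[t - 1] << 1
    EXP[t] = v ^ 0b10011 if v & 0b10000 else v
LOG = {EXP[t]: t for t in range(N)}


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def inv(a):
    return EXP[(N - LOG[a]) % N]


def transform(vec, t):
    """Finite field transform: sum_j vec[j] * beta^(t*j)."""
    acc = 0
    for j, c in enumerate(vec):
        acc ^= mul(c, EXP[(t * j) % N])
    return acc


def berlekamp_massey(S):
    """Massey's LFSR synthesis; returns [1, sigma_1, ..., sigma_v]."""
    C, B, L, m, b = [1], [1], 0, 1, 1
    for n in range(len(S)):
        d = S[n]                                 # discrepancy
        for i in range(1, L + 1):
            d ^= mul(C[i], S[n - i])
        if d == 0:
            m += 1
            continue
        T = C[:]
        coef = mul(d, inv(b))
        C = C + [0] * (len(B) + m - len(C))
        for i, bv in enumerate(B):
            C[i + m] ^= mul(coef, bv)
        if 2 * L <= n:
            L, B, b, m = n + 1 - L, T, d, 1
        else:
            m += 1
        C += [0] * (L + 1 - len(C))
    return C[:L + 1]


def rs_decode(r, E, k=1):
    """Transform-domain decoding of r (r[j] = coefficient of x^j)."""
    S = [transform(r, k + i) for i in range(2 * E)]          # step 1
    sigma = berlekamp_massey(S)                              # step 2
    Ek = S[:]                                                # Ek[i] = E_{k+i}
    for i in range(2 * E, N):                                # step 3
        acc = 0
        for s in range(1, len(sigma)):
            acc ^= mul(sigma[s], Ek[i - s])
        Ek.append(acc)
    e = [0] * N
    for j in range(N):                                       # step 4
        for i in range(N):
            e[j] ^= mul(Ek[i], EXP[(-(k + i) * j) % N])
    return [rj ^ ej for rj, ej in zip(r, e)]                 # step 5
```

With E=2 the model corrects any two symbol errors in a 15-symbol word; the double loop in step 4 is the direct form of the inverse transform and is not meant to reflect the pipelined circuit.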


Note that in steps 1, 4, and 5, the processing time
is proportional to N*J, while in steps 2 and 3 the
processing time is proportional to 2E*J and (N-2E)*J,
respectively. Hence, a natural partition for pipeline
processing is to divide the decoder system into three
pipeline stages. Stage 1 is used to perform step 1, stage 2 is
used to perform steps 2-4, and stage 3 is used to perform
step 5. To obtain a uniform throughput of one decoded
symbol per symbol clock cycle, each pipeline stage is
required to complete its computations in N symbol clock
cycles. As always, the throughput of the system is
determined by the slowest stage in the pipeline. [Ref. 12]

The RS decoder architecture using the above pipeline
decoding technique is shown in Figure 6.12. The timing
chart of the decoder is shown in Figure 6.13. In both
figures, note that the first 2E input symbols of the inverse
transform, which are $S_0, S_1, \ldots, S_{2E-1}$, can be processed in
parallel with the Berlekamp/Massey LFSR synthesis algorithm.
The remaining N-2E input symbols of the inverse
transform are obtained from the remaining transform. Each
of these N-2E input symbols is processed by the inverse
transform circuit immediately after its generation. In
stage 3, the buffer memory is read out symbol-by-symbol and
"Exclusive-OR'ed" with the output of the inverse transform.
A triple-buffered memory is required to store the three
active codewords in the pipeline. [Ref. 12]
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VII. CONCLUSION 


In this thesis we have taken a modular approach to the
systolic implementation of a Reed-Solomon encoder and
decoder. By initially discussing the theory behind systolic
arrays and finite fields, we have shown how they play an
integral part in the overall implementation. The binary
case is presented first because of its simple architecture
and ease of understanding. It is then followed by a
design of a systolic multiplier and an RS encoder and
decoder.

The multiplier requires m basic cells for the finite
field GF($2^m$). Because of its simple control methodology,
regular interconnection pattern, and modular structure it is
highly suited for VLSI implementation. The encoder using
the systolic multiplier offers the advantages of low power,
minimal size, and high reliability. The decoder,
being modular in design, is also highly suited for a systolic
architecture; thus the decoding speed can easily be
increased by using a distributed processing scheme. In
this way, several decoders can operate in parallel
simultaneously, while each individual decoder can operate in a
pipeline fashion.

The design of both the RS encoder and decoder is simple 


and regular. They can be constructed using a systolic array 
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of identical cells with every interconnection path occurring 
between adjacent cells. This makes implementation in VLSI 
extremely attractive since the layout of the cell need only 
be done once and then replicated. 

It is hoped that with this thesis as a guide, an 
interested electrical engineering student could implement 
the encoder or decoder in hardware. By building the four 
cell-binary encoder first, the student would establish a 
firm foundation vital to the development of the more 
complicated RS encoder. This process could then be expanded 
to produce an encoder of eight or sixteen cells, or the more
general case of $2^n$ cells.
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